Skip to content

Conversation

@lovesegfault
Copy link
Member

@lovesegfault lovesegfault commented Oct 1, 2025

Motivation

Fixes: #14130 (maybe?)

Context

Claude-aided debugging:

  1. A derivation has multiple outputs (e.g., out and dev)
  2. One output (out) is already valid in the store
  3. Another output (dev) is NOT valid
  4. User requests to build the already-valid output (out)
  5. Since not ALL outputs are valid, DerivationGoal cannot return early—it must proceed with the build
  6. DerivationGoal delegates to DerivationBuildingGoal to rebuild the entire derivation
  7. In registerOutputs() (derivation-builder.cc:1383-1388), already-valid outputs are skipped to avoid unnecessary re-registration
  8. The wantedOutput (out) is skipped, so it's NOT added to builtOutputs
  9. Back in DerivationGoal, the code tries to filter builtOutputs to only contain wantedOutput
  10. Assertion fails because wantedOutput is not in builtOutputs

I'm still trying to write a reproducer to validate...


Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@Ericson2314 Ericson2314 marked this pull request as draft October 1, 2025 21:24
@h0nIg
Copy link
Contributor

h0nIg commented Oct 9, 2025

@lovesegfault there is a second issue (duplicate for this). Is there a reason that this is still a draft?

#14172

@xokdvium
Copy link
Contributor

xokdvium commented Oct 9, 2025

Is there a reason that this is still a draft?

  1. No repro and unclear how this can occur
  2. It's AI slop and unclear if this fixes anything

@h0nIg
Copy link
Contributor

h0nIg commented Oct 9, 2025

Is there a reason that this is still a draft?

  1. No repro and unclear how this can occur
  2. It's AI slop and unclear if this fixes anything

@xokdvium can you tell me how much verbosity i need to activate, which does not output too much but still answers ericsons question in my ticket. I may be able to provide this once i reproduce it again (was not able to reproduce it locally but within a CI&CD)

Where you perhaps trying to build a specific output, and the derivation did not have an output with that exact name?

@xokdvium
Copy link
Contributor

xokdvium commented Oct 9, 2025

can you tell me how much verbosity i need to activate

debug would be ideal, but it's very noisy and contains curl logs that will leak potentially sensitive information. During the most recent call we've discussed some alternatives for providing more context to better debug this. #14194 might be helpful for this. I hope to finish that soon and backport it.

@h0nIg
Copy link
Contributor

h0nIg commented Oct 13, 2025

@xokdvium @lovesegfault this PR solves my problem, i cherry picked this change into 2.32.1 and the problem does not happen anymore. In addition I posted the code dump here: #14130 (comment)

@Ericson2314
Copy link
Member

@h0nIg see what i wrote on the issue. I am hesitant to do this except for as a last resort, it is as a real "shove the problem underneath the rug" fix.

@lovesegfault lovesegfault marked this pull request as ready for review October 31, 2025 01:16
@lovesegfault
Copy link
Member Author

We've started hitting this internally after I updated to 2.32, so I'm going to open this as it does seem to solve the problem.

Me and @xokdvium talked a bit about it today, and our best theory is that somehow there is a race to create the output, such as substitution vs building when only some outputs are cached, and then drv building races for that output.

@lovesegfault lovesegfault force-pushed the nix-debug-14130 branch 3 times, most recently from 8c7830b to c0534d7 Compare October 31, 2025 01:54
@Ericson2314 Ericson2314 enabled auto-merge October 31, 2025 01:58
@Ericson2314 Ericson2314 added backport 2.31-maintenance Automatically creates a PR against the branch backport 2.32-maintenance Automatically creates a PR against the branch labels Oct 31, 2025
@lovesegfault
Copy link
Member Author

@coderabbitai review

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 31, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 31, 2025

Walkthrough

In the DerivationGoal::haveDerivation function within derivation-goal.cc, the code now checks whether a wanted output is missing from the success.builtOutputs map. If missing, it injects an entry for that output using the current outputHash and includes a validity assertion with a debug message. This change replaces the previous assumption that the entry would always exist in the map, addressing cases where outputs were already valid and not re-registered.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify that the injected entry parameters (outputHash, validity assertion) are correctly computed and appropriate for the edge case
  • Confirm the debug message accurately describes the scenario and aids in troubleshooting
  • Validate that this graceful handling resolves the assertion failure without masking other issues
  • Check that the fix applies to all relevant code paths where outputs may already be valid

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Title Check ✅ Passed The title "fix(libstore/build/derivation-goal): don't assert on partially valid outputs" is directly related to the main change in the PR. It accurately summarizes the primary objective of the fix: addressing an assertion failure that occurs when a derivation has partially valid outputs. The title is specific, clear, and uses appropriate scope prefixing following conventional commit practices, making it easy for reviewers to understand the core fix at a glance.
Linked Issues Check ✅ Passed The PR directly addresses the primary objective of issue #14130 by fixing the assertion failure described in the linked issue: "Assertion buildResult.builtOutputs.count(wantedOutput) > 0' failed." The code change in DerivationGoal::haveDerivation now handles the case where wantedOutput is missing from success.builtOutputs by manually injecting an entry instead of asserting, which matches the expected behavior of allowing nix develop` to complete successfully when some outputs are cached and others are not. This aligns with the core requirement of fixing the crash in the derivation goal handling.
Out of Scope Changes Check ✅ Passed All changes are scoped to the specific objective: modifying src/libstore/build/derivation-goal.cc to handle the case where wantedOutput is missing from builtOutputs. The change is focused on the exact assertion failure point identified in issue #14130 and does not introduce unrelated modifications. No alterations to exported or public entities are reported, and the fix is narrowly targeted at resolving the assertion in the DerivationGoal::haveDerivation function.
Description Check ✅ Passed The PR description is clearly related to the changeset and provides meaningful context about the problem being fixed. It includes a step-by-step scenario explaining how the assertion failure occurs: a derivation with multiple outputs where one is already valid, triggering a rebuild that skips re-registering the valid output, which then causes the assertion when the code expects the output to be in builtOutputs. The description effectively motivates the fix by linking to issue #14130 and explaining the underlying root cause.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37c1ef5 and 9eecee3.

📒 Files selected for processing (1)
  • src/libstore/build/derivation-goal.cc (1 hunks)
🔇 Additional comments (1)
src/libstore/build/derivation-goal.cc (1)

281-297: The review comment's premise is incorrect: this is not masking a root cause, but handling expected behavior as documented.

The code appropriately handles a well-understood condition: when derivation outputs are already valid in the store, registerOutputs() intentionally skips re-registering them to builtOutputs. This is by design, not a bug. The comment at line 282 explicitly describes this: "was already valid and therefore not re-registered." The same pattern appears for bmCheck mode (lines 255-267), where the builder intentionally doesn't register outputs, requiring manual population.

That said, the implementation has valid observability concerns:

  1. Misleading message: "BUG!" suggests an unexpected condition, but this is expected behavior for partial cache hits. Change to "expected output was not registered by builder (already valid), adding manually" or similar.

  2. Debug-level logging: May not surface in production. Consider notice level with structured context (derivation path, whether this is a partial vs full cache scenario).

  3. No metrics: Operators can't track frequency or diagnose patterns without telemetry. Consider adding a counter.

Verify that assertPathValidity() (line 290) is the appropriate safety check for potentially-cached outputs before returning them.

Likely an incorrect or invalid review comment.


Comment @coderabbitai help to get the list of available commands and usage tips.

@Ericson2314 Ericson2314 added this pull request to the merge queue Oct 31, 2025
Merged via the queue into NixOS:master with commit 4a2fb18 Oct 31, 2025
19 checks passed
@internal-nix-ci
Copy link

Backport failed for 2.31-maintenance, because it was unable to cherry-pick the commit(s).

Please cherry-pick the changes locally and resolve any conflicts.

git fetch origin 2.31-maintenance
git worktree add -d .worktree/backport-14137-to-2.31-maintenance origin/2.31-maintenance
cd .worktree/backport-14137-to-2.31-maintenance
git switch --create backport-14137-to-2.31-maintenance
git cherry-pick -x 9eecee3d4e28634ef11d0044bb2e84bd8b13f2c7

@internal-nix-ci
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backport 2.31-maintenance Automatically creates a PR against the branch backport 2.32-maintenance Automatically creates a PR against the branch

Projects

None yet

Development

Successfully merging this pull request may close these issues.

nix develop erroring out with core dump on 2.31.*

4 participants